Automatic morphological analysis of Basque
Abstract
1 Introduction

The two-level model of computational morphology was proposed by Koskenniemi (1983) and has found widespread acceptance due mostly to its general applicability, declarativeness of rules and clear separation of linguistic knowledge and program. The essential difference from generative phonology is that there are no intermediate states between lexical and surface representations. Word recognition is reduced to finding valid lexical representations which correspond to a given surface form. Inversely, generation proceeds from a known lexical representation and searches for surface representations corresponding to it. The complexity of the model is studied in depth in (Barton, 85), who concludes that the complexity of a language has no significant effects on the speed of analysis or synthesis. The two-level model of morphology has become the most popular formalism for highly inflected and agglutinative languages (Antworth, 90) (Sproat, 92) (Oflazer, 94).

The two-level system is based on two main components (see Sproat, 1992):
• A lexicon where the morphemes (lemmas and affixes) and the possible links among them (morphotactics) are defined. The lexicon is divided into different sublexicons and each lexicon entry specifies its morphotactical information by means of a continuation class, which is a set of sublexicons. Combining sublexicons (nodes) and continuation classes (arcs), the graph of morphotactics is defined.
• A set of rules which controls the mapping between the lexical level and the surface level due to the morphonological transformations (morphophonemics).
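The lexicon component described above can be sketched as a small graph walk. This is a minimal illustration, not the lexicon of the system described here: the sublexicon names (`ROOT`, `SUFFIX1`, `SUFFIX2`), the `END` marker, the function `analyses` and the handful of morphemes are all assumptions chosen for the example, and morphophonemic rules are ignored (the lexical and surface forms are taken to be identical).

```python
from typing import Dict, List

# Hypothetical toy lexicon. Each sublexicon maps a morpheme to its
# continuation class: the list of sublexicons that may follow it.
# "END" is an assumed marker meaning "a word may stop here".
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "ROOT": {"etxe": ["SUFFIX1", "END"], "mendi": ["SUFFIX1", "END"]},
    "SUFFIX1": {"a": ["END"], "ko": ["SUFFIX2", "END"], "an": ["END"]},
    "SUFFIX2": {"a": ["END"]},
}


def analyses(word: str, sublexicon: str = "ROOT") -> List[List[str]]:
    """Return every morpheme segmentation of `word` licensed by the graph."""
    results: List[List[str]] = []
    for morpheme, continuations in LEXICON[sublexicon].items():
        if not word.startswith(morpheme):
            continue
        rest = word[len(morpheme):]
        for nxt in continuations:
            if nxt == "END":
                # The word may end here only if nothing is left to consume.
                if rest == "":
                    results.append([morpheme])
            else:
                # Follow the arc to the next sublexicon with the remainder.
                for tail in analyses(rest, nxt):
                    results.append([morpheme] + tail)
    return results


print(analyses("etxekoa"))  # one segmentation: etxe + ko + a
print(analyses("etxez"))    # no segmentation in this toy lexicon
```

Sublexicons act as nodes and continuation classes as arcs, so recognition amounts to finding a path from the root sublexicon that consumes the whole word.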
There are four kinds of rules: context restriction rules " => " (lexical character may be realized as the surface one in the given context), surface coercion rules " <= " (lexical character must be realized as the surface one in the given context), composite rules " <=> " (lexical character must be realized as the surface one in the given context and this change is licit only in this context) and exclusion rules (lexical character may not be realized as the surface one in the given context). The rules are independent from the morphotactics. The rules are compiled into transducers, so it is possible to apply the system for both analysis and generation. PC-Kimmo (Antworth, 90) is a freely available software tool which is useful to experiment with this formalism. Different flavours of two-level morphology have been developed, most of them changing the continuation class based morphotactics by unification based mechanisms (for instance Ritchie et al., …
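The four rule types can be illustrated by checking them directly against an aligned sequence of lexical:surface character pairs. This is only a sketch of the rule semantics under stated assumptions: real two-level systems compile the rules into finite-state transducers, whereas the code below simply tests an alignment. The rule (lexical `N` realized as surface `m` before `p`), the context function `before_p`, the helper `align` and the example strings are hypothetical and are not taken from the Basque rule set.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                       # (lexical char, surface char)
Context = Callable[[List[Pair], int], bool]  # is position i inside the context?


def align(lexical: str, surface: str) -> List[Pair]:
    """Pair up two equal-length strings character by character."""
    assert len(lexical) == len(surface)
    return list(zip(lexical, surface))


def before_p(pairs: List[Pair], i: int) -> bool:
    """Toy context (an assumption): the next pair is p:p."""
    return i + 1 < len(pairs) and pairs[i + 1] == ("p", "p")


def context_restriction(pairs: List[Pair], lex: str, surf: str, ctx: Context) -> bool:
    """lex:surf => ctx. The pair lex:surf may occur only inside the context."""
    return all(ctx(pairs, i) for i, p in enumerate(pairs) if p == (lex, surf))


def surface_coercion(pairs: List[Pair], lex: str, surf: str, ctx: Context) -> bool:
    """lex:surf <= ctx. Inside the context, lex must be realized as surf."""
    return all(s == surf for i, (l, s) in enumerate(pairs)
               if l == lex and ctx(pairs, i))


def composite(pairs: List[Pair], lex: str, surf: str, ctx: Context) -> bool:
    """lex:surf <=> ctx. Both of the conditions above must hold."""
    return (context_restriction(pairs, lex, surf, ctx)
            and surface_coercion(pairs, lex, surf, ctx))


def exclusion(pairs: List[Pair], lex: str, surf: str, ctx: Context) -> bool:
    """Exclusion. Inside the context, lex may not be realized as surf."""
    return not any(p == (lex, surf) and ctx(pairs, i)
                   for i, p in enumerate(pairs))


good = align("kaNpat", "kampat")   # N:m before p:p
bad = align("kaNpat", "kanpat")    # N:n before p:p
print(composite(good, "N", "m", before_p))        # accepted by the rule
print(surface_coercion(bad, "N", "m", before_p))  # rejected: N had to become m
```

Because each check only inspects a pairing of the two levels, the same rule constrains both analysis (surface to lexical) and generation (lexical to surface).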
Similar resources
Automatic Morphological Segmentation for Continuous Speech Recognition of Basque
The selection of appropriate Lexical Units (LUs) is an important issue in the development of Continuous Speech Recognition (CSR) systems. Word has been used classically as unit in most of them. However, proposals of non-word units have begun to arise. Since the subject of this study is the Basque language, which is an agglutinative language with a complex structure inside words, non-word units ...
Different Issues in the Design of a General-Purpose Lexical Database for Basque
EDBL is a lexical database (LDB) for Basque. This paper presents the design and the main features of this database, conceived as a general lexical basis for the automatic treatment of Basque. The conceptual schema of EDBL is explained by means of Extended ER diagrams and Feature Structures. The implementation of the database in a commercial RDBMS and the problems encountered in this implementat...
Extraction of semantic relations from a Basque monolingual dictionary using Constraint Grammar
This paper deals with the exploitation of dictionaries for the semi-automatic construction of lexicons and lexical knowledge bases. The final goal of our research is to enrich the Basque Lexical Database with semantic information such as senses, definitions, semantic relations, etc., extracted from a Basque monolingual dictionary. The work here presented focuses on the extraction of the semanti...
LINGUISTIC DESCRIPTION IN DICTIONARIES: SEMANTICS Extraction of semantic relations from a Basque monolingual dictionary using Constraint Grammar
This paper deals with the exploitation of dictionaries for the semi-automatic construction of lexicons and lexical knowledge bases. The final goal of our research is to enrich the Basque Lexical Database with semantic information such as senses, definitions, semantic relations, etc., extracted from a Basque monolingual dictionary. The work here presented focuses on the extraction of the semanti...
A Fault Diagnosis Method for Automaton based on Morphological Component Analysis and Ensemble Empirical Mode Decomposition
In the fault diagnosis of automaton, the vibration signal presents non-stationary and non-periodic, which make it difficult to extract the fault features. To solve this problem, an automaton fault diagnosis method based on morphological component analysis (MCA) and ensemble empirical mode decomposition (EEMD) was proposed. Based on the advantages of the morphological component analysis method i...